Fix Energon data loading incompatibility with updated Qwen3-VL finetuning pipeline#2680
Fix Energon data loading incompatibility with updated Qwen3-VL finetuning pipeline#2680aub123 wants to merge 1 commit intoNVIDIA-NeMo:mainfrom
Conversation
…ng pipeline Signed-off-by: aub123 <2546319206@qq.com>
2530b0e to
9603399
Compare
📝 WalkthroughWalkthroughUpdates image and video handling in a Qwen VL model example by introducing a new tensor conversion method in the task encoder and correcting the file extension mapping in the dataset documentation from Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~12 minutes 🚥 Pre-merge checks | ✅ 2 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Tip Try Coding Plans. Let us write the prompt for your AI agent so you can ship faster (with fewer bugs). Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (2)
src/megatron/bridge/recipes/qwen_vl/data/energon/task_encoder.py (2)
216-216:extensions_mappingis now unused.The
self.extensions_mappingdictionary is no longer used after this refactor. Previously it was used to map extensions to keys forimagehandler, but now_convert_to_tensoralways passes"jpg"directly. Consider removing the unused attribute to avoid confusion.🧹 Proposed cleanup
def __init__(self, imagespec): self.extensions = ["jpgs", "mp4s", "videos"] - self.extensions_mapping = {"jpgs": "jpg", "mp4s": "jpg", "videos": "jpg"} self.image_handler = imagehandler(imagespec)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/megatron/bridge/recipes/qwen_vl/data/energon/task_encoder.py` at line 216, Remove the unused attribute self.extensions_mapping from the TaskEncoder class: locate where it is defined (self.extensions_mapping = {"jpgs": "jpg", "mp4s": "jpg", "videos": "jpg"}) and delete that line since _convert_to_tensor now always passes "jpg" directly and imagehandler no longer relies on it; ensure no other code references extensions_mapping (search for extensions_mapping) before committing.
219-231: Add type hints for the new method.Per coding guidelines, functions should have type hints for arguments and return types. This improves readability and enables static type checking.
💡 Suggested type hints
- def _convert_to_tensor(self, data): - """Convert numpy array or bytes to tensor. - - The wds conversion script stores images as numpy arrays (HWC, uint8), - so we need to handle both numpy arrays and raw bytes. - """ + def _convert_to_tensor(self, data: np.ndarray | bytes) -> torch.Tensor: + """Convert numpy array or bytes to tensor. + + The wds conversion script stores images as numpy arrays (HWC, uint8), + so we need to handle both numpy arrays and raw bytes. + + Args: + data: Image data as numpy array (HWC, uint8) or raw bytes. + + Returns: + Tensor in CHW format with float32 values in [0, 1]. + """ if isinstance(data, np.ndarray):🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/megatron/bridge/recipes/qwen_vl/data/energon/task_encoder.py` around lines 219 - 231, Add type hints to _convert_to_tensor: annotate the data parameter as Union[np.ndarray, bytes] and the return type as torch.Tensor (i.e., def _convert_to_tensor(self, data: Union[np.ndarray, bytes]) -> torch.Tensor). Also ensure typing Union is imported (from typing import Union) and that image_handler's return type is compatible with torch.Tensor; update image_handler signature or cast its result to torch.Tensor if needed. This change applies to the _convert_to_tensor method in task_encoder.py and any related image_handler definition used by this method.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@src/megatron/bridge/recipes/qwen_vl/data/energon/task_encoder.py`:
- Line 216: Remove the unused attribute self.extensions_mapping from the
TaskEncoder class: locate where it is defined (self.extensions_mapping =
{"jpgs": "jpg", "mp4s": "jpg", "videos": "jpg"}) and delete that line since
_convert_to_tensor now always passes "jpg" directly and imagehandler no longer
relies on it; ensure no other code references extensions_mapping (search for
extensions_mapping) before committing.
- Around line 219-231: Add type hints to _convert_to_tensor: annotate the data
parameter as Union[np.ndarray, bytes] and the return type as torch.Tensor (i.e.,
def _convert_to_tensor(self, data: Union[np.ndarray, bytes]) -> torch.Tensor).
Also ensure typing Union is imported (from typing import Union) and that
image_handler's return type is compatible with torch.Tensor; update
image_handler signature or cast its result to torch.Tensor if needed. This
change applies to the _convert_to_tensor method in task_encoder.py and any
related image_handler definition used by this method.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 598a6015-fbb7-4a34-ac42-89d0aba24cdd
📒 Files selected for processing (2)
examples/models/vlm/qwen3_vl/README.mdsrc/megatron/bridge/recipes/qwen_vl/data/energon/task_encoder.py
Background
The data loading logic for Energon-format datasets is not compatible with the updated Qwen3-VL finetuning pipeline.
Recent updates changed the expected multimodal sample structure, which causes mismatches when loading Energon datasets.
Changes
qwen3_vl_bridge.pyNotes
This change focuses on compatibility with the new Qwen3-VL finetuning logic and does not modify the Energon dataset format itself.
Tested on Qwen3-VL-8B-Instruct model.
Summary by CodeRabbit
Documentation
Bug Fixes